Description



[0001] The present invention relates to a method for producing a mixture of nucleic acids, and in particular oligonucleotide primers. The general field of this invention is molecular biology, and particularly gene expression analysis.
[0002] The characterization of cellular gene expression (i.e., gene expression analysis) finds application in a variety of disciplines, such as in the analysis of differential expression between different tissue types, different stages of cellular growth, or between normal and diseased states.

[0003] Fundamental to differential expression analysis is the detection of different mRNA species in a test sample, and often the quantitative determination of different mRNA levels in that test sample. In order to detect different mRNA levels in a given test population, a population of labeled target nucleic acids that, at least partially, reflects or mirrors the mRNA profile of the test sample is produced. In other words, a population of labeled target nucleic acids is generated in which at least a portion of the mRNA species in the test sample are represented, in terms of presence and often in terms of amount. Following target generation, the target population is contacted with one or more probe sequences, e.g., as found on an array, whereby the presence and often the amount of specific targets in the target population are detected. From the resultant data, information about the mRNAs present in the sample, i.e., the mRNA profile and gene expression profile, can be readily deduced.

[0004] A fundamental step in gene expression analysis assays is, therefore, the step of labeled target generation. Target generation protocols typically include a primer extension reaction, in which a primer is contacted with an initial mRNA sample to produce a labeled target population, as described above. In certain protocols, polyA primers and variants thereof are employed. Disadvantages of such protocols include the inability to produce target from prokaryotic mRNA species that lack a polyA tail and the propensity of such protocols to produce target that lacks 5' mRNA information. While the use of random primers overcomes some of these disadvantages, random primer protocols suffer from their own disadvantages, e.g., lack of specificity resulting from increased complexity in the primer mixture produced by the process, where not only mRNA is represented, but also rRNA, tRNA and snRNA. In yet other protocols, custom primer mixes are employed in target generation. While such protocols overcome the above-described disadvantages of polyA and random primer based protocols, custom primer mix or gene specific primer based protocols can be prohibitively expensive, particularly in array-based hybridization protocols in which custom arrays are employed.
[0005] As such, there is continued interest in the development of new primer generation protocols. Of particular interest would be the development of a protocol that realizes the advantages of gene specific primer based protocols while at the same time being economical to perform, and that is therefore suitable for use in custom array-based hybridization assays.



Relevant Literature 



[0006] See U.S. Patent No. 5,795,714 and the references cited therein.

[0007] Methods for generating mixtures of nucleic acids, e.g., oligonucleotide primers, are provided. In the subject methods, an array of probe nucleic acids is employed as template to generate mixtures of nucleic acids via a template driven primer extension reaction. In preferred embodiments, each probe on the array employed in the subject methods comprises a constant domain and a variable domain, where the constant domain is further characterized by having at least a recognition domain, and optionally a functional domain and/or linker domain. Also provided are the arrays employed in the subject methods and kits for practicing the subject methods. The subject methods find use in a variety of applications, including the generation of target nucleic acids from an mRNA sample for use in hybridization assays, e.g., differential gene expression analysis.

[0008] A number of preferred embodiments of the present invention will now be disclosed, with reference to the drawings, in which:

Figure 1 provides a view of the stained gel produced in Example 1 of the Experimental section, infra. 

[0009] The term "nucleic acid" as used herein means a polymer composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, or compounds produced synthetically (e.g., PNA as described in U.S. Patent No. 5,948,902 and the references cited therein) which can hybridize with naturally occurring nucleic acids in a sequence specific manner analogous to that of two naturally occurring nucleic acids.

[0010] The terms "ribonucleic acid" and "RNA" as used herein mean a polymer composed of ribonucleotides. 
[0011] The terms "deoxyribonucleic acid" and "DNA" as used herein mean a polymer composed of deoxyribonucleotides.

[0012] The term "oligonucleotide" as used herein denotes single-stranded nucleotide multimers of from about 10 to 100 nucleotides and up to 200 nucleotides in length.

[0013] The term "polynucleotide" as used herein refers to a single- or double-stranded polymer composed of nucleotide monomers of generally greater than 100 nucleotides in length.
[0014] The term "mRNA" means messenger RNA. 

[0015] The term "array" means a substrate having at least one planar surface on which is immobilized a plurality of different probe nucleic acids.

[0016] Methods for generating mixtures of nucleic acids, e.g., oligonucleotide primers, are provided. In the subject methods, an array is employed as template to generate mixtures of nucleic acids via a template driven primer extension reaction. In preferred embodiments, each probe on the array employed in the subject methods comprises a constant domain and a variable domain, where the constant domain is further characterized by having at least a recognition domain, and optionally a functional and/or linker domain. Also provided are the arrays employed in the subject methods and kits for practicing the subject methods. The subject methods find use in a variety of applications, including the generation of target nucleic acids from an mRNA sample for use in hybridization assays, e.g., differential gene expression analysis. In further describing the subject invention, the subject methods will be described first, followed by a review of representative protocols in which the nucleic acid mixtures produced by the subject methods find use, as well as a description of kits that find use in practicing the subject methods.

[0017] Before the subject invention is described further, it is to be understood that the invention is not limited to the particular embodiments of the invention described below, as variations of the particular embodiments may be made and still fall within the scope of the appended claims. It is also to be understood that the terminology employed is for the purpose of describing particular embodiments, and is not intended to be limiting. Instead, the scope of the present invention will be established by the appended claims.

[0018] In this specification and the appended claims, the singular forms "a," "an" and "the" include plural referents unless the context clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
[0019] As summarized above, the subject invention provides methods for generating mixtures of nucleic acids by a template driven primer extension protocol in which an array is employed as template. The mixture of nucleic acids produced by the subject methods is characterized by having a known composition. As such, at least the sequence of each individual or distinct nucleic acid of differing sequence in the mixture is known. In many embodiments, the relative amount or copy number of each distinct nucleic acid of differing sequence is also known. Each nucleic acid present in the mixture at least includes a variable domain that serves to distinguish it from any other nucleic acid in the mixture, i.e., from any nucleic acid that does not have the identical sequence (any nucleic acid that is not its copy). The variable domain, S_ij, is a nucleic acid that hybridizes under stringent conditions to gene i at location j and is capable of serving as a primer in reverse transcription beginning at base j. The number of different variable domains, S_ij, present in the mixture may vary, but is generally at least about 10, usually at least about 20 and more usually at least about 50, where the number may be as great as 25,000 or greater. In many embodiments, the number of different variable domains present in the mixture ranges from about 1,978 to 25,000, usually from about 4,200 to 8,400. In addition to the distinguishing variable domain, the constituent members of the mixture may all share one or more domains of common sequence, depending on the particular protocol employed to generate the mixture, as described in greater detail below.
[0020] In the subject methods, the first step is generally to provide an array, i.e., a substrate having a planar surface on which is immobilized a plurality of distinct nucleic acid probes, in which each probe sequence on the array includes a constant domain and a complement variable domain. This providing step may include either generating the array de novo or obtaining a pre-made array from a commercial source, where in either case the array will have the characteristics described below. Arrays of nucleic acids are known in the art; representative arrays that may be modified to become arrays of the subject invention, as described below, include those described in U.S. Patent Nos. 5,242,974; 5,384,261; 5,405,783; 5,412,087; 5,424,186; 5,429,807; 5,436,327; 5,445,934; 5,472,672; 5,527,681; 5,529,756; 5,545,531; 5,554,501; 5,556,752; 5,561,071; 5,599,695; 5,624,711; 5,639,603; 5,658,734; 5,795,714; WO 93/17126; WO 95/11995; WO 95/35505; EP 742 287; and EP 799 897; the disclosures of which are herein incorporated by reference.
[0021] As mentioned above, each distinct probe nucleic acid on the array includes a constant domain and a complement variable domain. The complement variable domain of each distinct probe has a sequence that is the complement of a variable or distinguishing domain found in a constituent member of the mixture of nucleic acids that is produced by the subject methods as described above, where by complement is meant that the variable and complement variable sequences hybridize under stringent conditions, e.g., at 50°C or higher and 0.1× SSC (15 mM sodium chloride/1.5 mM sodium citrate), or thermodynamically equivalent conditions. Thus, the array includes a plurality of distinct probes that differ from each other by complement variable domain, where the number of distinct probes on an array employed in the subject methods is typically at least 10, usually at least 20 and more usually at least 50, where the number may be as high as 25,000 or higher. In many embodiments, the number of distinct probes ranges from about 1,978 to 25,000, usually from about 4,200 to 8,400.
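
The 15 mM/1.5 mM figures quoted for 0.1× SSC in [0021] follow from the standard 1× SSC recipe of 150 mM sodium chloride and 15 mM sodium citrate. The short sketch below reproduces that dilution arithmetic; the function name and dictionary layout are illustrative assumptions, and the temperature part of the stringency condition (50°C or higher) is not modelled.

```python
# Standard composition of 1x SSC (saline-sodium citrate), in mM.
SSC_1X_MM = {"sodium chloride": 150.0, "sodium citrate": 15.0}


def ssc_composition(strength: float) -> dict:
    """Return the mM concentration of each salt in an SSC buffer of the given
    strength, e.g. strength=0.1 for a 0.1x SSC stringency wash."""
    if strength <= 0:
        raise ValueError("strength must be positive")
    return {salt: conc * strength for salt, conc in SSC_1X_MM.items()}


wash = ssc_composition(0.1)
for salt, mm in wash.items():
    print(f"{salt}: {mm:g} mM")
```
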

[0022] Because of the nature of the subject methods, as described below, each distinct complement variable domain will be represented in the nucleic acid mixture produced using the array, i.e., the complement of each distinct complement variable domain sequence will be found in the mixture of nucleic acids produced by the subject methods. For example, where an array has 10 different probes that differ by complement variable domain, such that it has 10 different complement variable domains, i.e., cV1 to cV10, the nucleic acid mixture produced by the subject methods as described below will have 10 different or distinct nucleic acids, where each different nucleic acid sequence in the mixture includes a sequence that is the complement of one of cV1 to cV10, i.e., one of V1 to V10.
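
The one-species-per-distinct-cV bookkeeping of [0022] can be sketched as follows, assuming all sequences are DNA written 5'→3' and that the variable domain V carried by each product is simply the reverse complement of its probe's cV. The sequences and function names are hypothetical.

```python
# Watson-Crick pairing for DNA; sequences are written 5'->3' throughout.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a 5'->3' DNA sequence, also 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def mixture_species(array_features: list) -> set:
    """Return the distinct variable domains V expected in the product mixture.

    Each feature is the complement variable domain cV of one probe; replicate
    features carrying the same cV contribute extra copies, not new species."""
    return {reverse_complement(cv) for cv in array_features}


# Hypothetical array: three distinct cV domains, one of them spotted twice.
features = ["ACGTTGCC", "GGCATTAC", "ACGTTGCC", "TTTGCCAA"]
species = mixture_species(features)
print(len(species))  # 3: one species per distinct cV domain
print(sorted(species))
```
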

[0023] The relative copy number of each probe on the array may or may not be selected to "normalize" the nucleic acid mixture made with the array with respect to the mRNA sample with which it is to be used. For example, if the array is to be used to make a nucleic acid mixture that has a 10-fold increase in the copy number of target that hybridizes to a rare mRNA, the copy number of the corresponding (e.g., identical or complementary) probe on the array can be appropriately increased relative to other probes that correspond to less rare mRNA species in the mRNA sample. In many embodiments, the complement variable domain is a domain whose sequence is chosen to hybridize under stringent conditions to a sequence of interest found in a particular mRNA. In many embodiments, the complement variable domain has a sequence denoted cS_ij, where c stands for complement and S_ij is a nucleic acid that primes reverse transcription of a gene i beginning at base j. Thus, in many embodiments of the invention, the complement variable domain of each probe is the complement of a nucleic acid that is capable of hybridizing to a different gene of interest i at location or base j and acting as a primer under reverse transcription conditions. For example, where 10 different genes, i.e., genes 1 to 10, are represented on the array, the sequence of interest for each gene begins at base number 50, 60, 70, 80, 90, 100, 110, 120, 130 and 140, respectively (counting from the 5' end of the mRNA molecule), and each complement variable domain is 20 bases long, the complement variable domains of the distinct probes on the array, i.e., cV1 to cV10, will be as follows:

Complement Variable Domain    Sequence
cV1     Sequence that hybridizes under stringent conditions to bases 50 to 30 of gene 1
cV2     Sequence that hybridizes under stringent conditions to bases 60 to 40 of gene 2
cV3     Sequence that hybridizes under stringent conditions to bases 70 to 50 of gene 3
cV4     Sequence that hybridizes under stringent conditions to bases 80 to 60 of gene 4
cV5     Sequence that hybridizes under stringent conditions to bases 90 to 70 of gene 5
cV6     Sequence that hybridizes under stringent conditions to bases 100 to 80 of gene 6
cV7     Sequence that hybridizes under stringent conditions to bases 110 to 90 of gene 7
cV8     Sequence that hybridizes under stringent conditions to bases 120 to 100 of gene 8
cV9     Sequence that hybridizes under stringent conditions to bases 130 to 110 of gene 9
cV10    Sequence that hybridizes under stringent conditions to bases 140 to 120 of gene 10
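
The normalization described at the start of [0023], raising a probe's copy number so that the primer for a rare mRNA is over-represented, can be sketched as below. The proportional rule, the function names and the gene labels are hypothetical, and the sketch assumes that every probe yields the same amount of product per template copy, which a real array need not satisfy.

```python
def probe_copy_numbers(fold_enrichment: dict, base_copies: int = 1) -> dict:
    """Hypothetical rule: give each probe base_copies features, multiplied by
    the fold enrichment wanted for the mRNA it targets (1 = no enrichment)."""
    if base_copies < 1:
        raise ValueError("base_copies must be at least 1")
    return {gene: base_copies * fold for gene, fold in fold_enrichment.items()}


def expected_fractions(copies: dict) -> dict:
    """Expected share of each primer in the mixture, ASSUMING product yield is
    proportional to the number of template copies on the array."""
    total = sum(copies.values())
    return {gene: n / total for gene, n in copies.items()}


# gene1 stands in for a rare mRNA that should get a 10-fold boost.
copies = probe_copy_numbers({"gene1": 10, "gene2": 1, "gene3": 1}, base_copies=5)
print(copies)  # {'gene1': 50, 'gene2': 5, 'gene3': 5}
fractions = expected_fractions(copies)
print(round(fractions["gene1"] / fractions["gene2"], 6))  # 10.0
```
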



[0024] While the length of the complement variable domain in the specific example provided above is 20 bases or residues, i.e., 20 nt, the length may vary considerably and will be chosen based on the desired length of the resultant nucleic acids in the mixture to be produced, within the synthesis constraints of the subject method. Generally, the length of the complement variable domain will range from about 15 to 40, usually from about 15 to 30 and more usually from about 20 to 25 nt.

[0025] As mentioned above, in addition to the unique complement variable domain, each probe nucleic acid present on the array includes a common or shared constant domain 3' of the complement variable domain. This constant domain typically ranges in length from about 20 to 50, usually from about 20 to 45 and more usually from about 25 to 40 nt. The constant domain typically comprises at least one of the following constant sub-domains: a functional domain, a recognition domain and a linker domain. In many embodiments, each probe contains at least a recognition sub-domain, and optionally a functional domain and/or a linker domain. These constant sub-domains may be grouped together on the probe or separated so as to flank the variable domain of the probe. As such, in certain embodiments these sub-domains are generally arranged in the order of functional domain, recognition domain and linker domain going from the 5' to the 3' end of the probe sequence, such that the linker domain is at the 3' probe terminus and is attached, either directly or indirectly, to the substrate surface of the array. In yet other embodiments, one or more of the domains, e.g., the functional sub-domain, may be present on the 5' end of the variable domain.
[0026] The optional functional sub-domain is generally a sequence that imparts or contributes some function to a duplex nucleic acid in which it is present. Functional domains of interest include: polymerase promoter sites, e.g., T3 or T7 RNA polymerase promoter sites; sequences unique with respect to the intended target organism for the array experiment (i.e., unique priming sites); and the like. The length of this functional domain typically ranges from about 10 nt to 40 nt, usually from about 20 nt to 30 nt.
[0027] The recognition sequence of the constant domain is typically a sequence that, when present in duplex format, is recognized and cleaved by a restriction endonuclease. A large number of restriction endonucleases are known to those of skill in the art. Specific restriction endonuclease recognition sites of interest that may make up the subject recognition sequence include, but are not limited to, the HincII site and the like. Generally, the length of the recognition domain ranges from about 4 nt to 8 nt, usually from about 5 nt to 6 nt.

[0028] The linker sub-domain of the subject constant domains is optional. The linker domain may be any convenient sequence, including random sequence, or a non-polynucleotide chemical linker (e.g., an ethylene glycol-based polyether oligomer), where the sole purpose of the linker domain is to project the other domains of the probe away from the substrate surface. Generally, the linker domain, if present, has a length ranging from about 1 to 20, usually from about 1 to 15 and more usually from about 1 to 10, including 5 to 10, nt.
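
The general length ranges given in [0024] to [0028] can be collected into a simple design check, sketched below. The ranges are copied from the text, but because the text says "about", the sketch treats them as soft guidance; the function name and the example domains (including the placeholder functional domain) are hypothetical.

```python
# Approximate length ranges (nt) quoted in [0024]-[0028]. The text says "about",
# so these are treated as soft design guidance rather than hard limits.
RANGES_NT = {
    "complement_variable": (15, 40),
    "constant": (20, 50),
    "functional": (10, 40),
    "recognition": (4, 8),
    "linker": (1, 20),
}


def check_probe_domains(cv: str, recognition: str, functional: str = "", linker: str = "") -> list:
    """Return a warning for each domain whose length falls outside its range.

    functional and linker are optional; an empty string means the domain is omitted."""
    lengths = {
        "complement_variable": len(cv),
        "recognition": len(recognition),
        "constant": len(functional) + len(recognition) + len(linker),
    }
    if functional:
        lengths["functional"] = len(functional)
    if linker:
        lengths["linker"] = len(linker)
    warnings = []
    for name, n in lengths.items():
        low, high = RANGES_NT[name]
        if not low <= n <= high:
            warnings.append(f"{name}: {n} nt (expected about {low}-{high} nt)")
    return warnings


# Hypothetical domains: 20-nt cV, a 6-nt HincII site (GTYRAC, here GTCAAC),
# a 20-nt placeholder functional domain and a 5-nt poly-T linker.
print(check_probe_domains("ACGT" * 5, "GTCAAC", "GATC" * 5, "T" * 5))  # []
print(check_probe_domains("ACGT" * 2, "GTCAAC", "GATC" * 5, "T" * 5))
```
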

[0029] In many, though not all, embodiments, each surface bound probe on the array employed in the subject methods is described by the following formula:

surface-3'-L-R-F-cV-5'

wherein:

L is the optional linking domain;
R is the recognition domain;
F is the functional domain; and
cV is the complement variable domain, i.e., cS_ij, the complement of the variable domain S_ij of the nucleic acid produced by the subject methods, to which it hybridizes under stringent conditions;

where each of these elements is as described above.

[0030] As mentioned above, the subject arrays are provided by any convenient means, including obtaining them from a commercial source or synthesizing them de novo. To synthesize the arrays employed in the subject methods, the first step is generally to determine the nature of the mixture of nucleic acids that is to be produced using the subject array according to the subject methods. In those embodiments where the nucleic acid mixture is to be employed as gene specific primers in the generation of target nucleic acid, as described in greater detail below, the first step is to identify those genes that are to be represented by a primer in the primer mixture, i.e., those specific mRNAs potentially present in the experimental samples that are to have primers in the mixture capable of hybridizing to them under stringent conditions. Following identification of these genes, the specific region, i.e., stretch or domain, of each mRNA to which the primer is to hybridize is then identified. These specific domains or regions may be identified using any convenient protocol and set of selection criteria, where of interest in many embodiments is the use of the algorithm, and the selection methods based thereon, described in U.S. Patent Application Serial No. 09/021,701, the disclosure of which is herein incorporated by reference. As such, a plurality of different sequences of interest will be identified, wherein each sequence is described by the formula S_ij, where i is the gene of interest and j is the specific base at which the sequence starts, as described above. Following identification of each variable or S_ij sequence as described above, a probe sequence for each different variable or S_ij sequence is identified, where in many embodiments the probe has the following sequence:

3'-L-R-F-cV-5'

wherein:

L is the linking domain;
R is the recognition domain;
F is the functional domain; and
cV is the complement of the variable domain, i.e., cS_ij;

where each of these elements is as defined above and each of the probes varies only in terms of its cV domain.
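
The probe design step of [0030] can be sketched as follows. Writing every domain 5'→3', the probe 3'-L-R-F-cV-5' reads cV + F + R + L, so each probe is the reverse complement of its S_ij followed by a shared constant domain. The sketch assumes plain DNA sequences; the function names, dictionary keys and all sequences are hypothetical, and the selection of S_ij itself (e.g., by the algorithm of Serial No. 09/021,701) is not modelled.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_probes(targets: dict, functional: str, recognition: str, linker: str = "") -> dict:
    """Build one probe per variable-domain sequence S_ij.

    All strings are 5'->3'. The probe 3'-L-R-F-cV-5' therefore reads
    cV + F + R + L in the 5'->3' direction, with the linker at the 3' end
    that is attached to the surface."""
    if len(set(targets.values())) != len(targets):
        raise ValueError("each S_ij sequence must be distinct")
    constant = functional + recognition + linker
    return {label: reverse_complement(s_ij) + constant for label, s_ij in targets.items()}


# Hypothetical S_ij sequences, keyed by (gene i, start base j).
targets = {(1, 50): "AAACCCGGGT", (2, 60): "TTTGGGCCCA"}
probes = design_probes(targets, functional="GATC" * 5, recognition="GTCAAC", linker="T" * 5)
for label, probe in probes.items():
    print(label, probe)
```

Because the constant part is appended unchanged, the probes differ only in their cV prefixes, matching the statement that each probe varies only in its cV domain.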
[0031] Following identification of the probe sequences as defined above, an array is produced in which each of the probe sequences of the identified set is present. The array may be produced using any convenient protocol, where suitable protocols include both synthesis of the complement probe followed by deposition onto a substrate surface, as well as synthesis of the probe directly on the substrate surface. Representative protocols for array synthesis are described in U.S. Patent Nos. 5,242,974; 5,384,261; 5,405,783; 5,412,087; 5,424,186; 5,429,807; 5,436,327; 5,445,934; 5,472,672; 5,527,681; 5,529,756; 5,545,531; 5,554,501; 5,556,752; 5,561,071; 5,599,695; 5,624,711; 5,639,603; 5,658,734; 5,795,714; WO 93/17126; WO 95/11995; WO 95/35505; EP 742 287; and EP 799 897; the disclosures of which are herein incorporated by reference.

[0032] Following provision of the array employed in the subject methods, as described above, the next step is to contact the array with a universal primer under hybridization conditions sufficient to produce a template array that includes a plurality of overhang-comprising duplex nucleic acids on its surface, where the overhang is made up of the complement variable domain of each probe of the array. The universal primer is capable of hybridizing to the constant domain, or at least a portion thereof (e.g., at least that portion immediately 3' of the complement variable domain). The universal primer has a length that is sufficient to prime template driven primer extension, where the length of the universal primer generally ranges from about 10 to 45 nt, usually from about 15 to 35 nt and more usually from about 20 to 30 nt. In many embodiments, the universal primer is the complement of the recognition and/or functional sub-domains of the constant domain of each probe on the array. As such, in many embodiments the universal primer employed has a sequence described by the formula:

5'-cR-cF-3'

wherein:

cR is the complement of the recognition domain; and
cF is the complement of the functional domain.
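
Why the universal primer reads 5'-cR-cF-3' can be checked with a short sketch. Assuming the probe is written 5'→3' as cV + F + R + L, the primer must pair with the F + R segment, and the reverse complement of F + R is cR followed by cF. The domain sequences and the function name below are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def universal_primer(functional: str, recognition: str) -> str:
    """Universal primer 5'-cR-cF-3'.

    On a probe read 5'->3' as cV + F + R + L, the primer pairs with the F + R
    segment, so it is the reverse complement of F + R, which equals
    reverse_complement(R) + reverse_complement(F), i.e. cR then cF."""
    return reverse_complement(functional + recognition)


F = "GATC" * 5  # hypothetical 20-nt functional domain
R = "GTCAAC"    # example HincII site (GTYRAC)
primer = universal_primer(F, R)
print(primer, len(primer))  # 26 nt, inside the usual 20-30 nt range of [0032]
```
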

[0033] As mentioned above, the template array produced by this method is an array of duplex probe molecules made up of a first nucleic acid having a constant and a complement variable domain and a second nucleic acid which is the universal primer and is hybridized to the constant domain (or at least that portion of the constant domain that is 3' of the variable domain complement). As such, the array produced by this step is an array of overhang-comprising duplex nucleic acid molecules, typically DNA, where the overhang is made up of the complement variable domain of each probe on the array.
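
The effect of extending the annealed universal primer across the cV overhang can be modelled at the sequence level, as sketched below. This is an idealized string model (perfect pairing, complete extension, first-match annealing) with hypothetical sequences and function names; it shows that the product reads 5'-cR-cF-S_ij-3'.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extend_primer(probe: str, primer: str) -> str:
    """Idealized template-driven extension of a primer annealed to a probe.

    Both strings are 5'->3'. The primer's binding site on the probe is its
    reverse complement; the probe sequence 5' of that site is the
    single-stranded overhang (cV), which the polymerase copies onto the
    primer's 3' end. Uses the first match, so a cV that happens to contain
    the binding site would be mis-handled."""
    site = reverse_complement(primer)
    start = probe.find(site)
    if start == -1:
        raise ValueError("primer does not anneal to this probe")
    overhang = probe[:start]
    return primer + reverse_complement(overhang)


S = "AAACCCGGGT"                          # hypothetical variable domain S_ij
F, R, L = "GATC" * 5, "GTCAAC", "T" * 5   # hypothetical constant sub-domains
probe = reverse_complement(S) + F + R + L  # 5'-cV-F-R-L-3'
primer = reverse_complement(F + R)         # 5'-cR-cF-3'
product = extend_primer(probe, primer)
print(product)  # 5'-cR-cF-S_ij-3'
```
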

[0034] This template array of overhang-comprising duplex probes is then subjected to primer extension reaction conditions sufficient to produce the desired mixture of nucleic acids. The specific primer extension reaction conditions to which the template array of overhang-comprising duplex nucleic acids is subjected may vary depending on the particular protocol used and/or the specific nature of the nucleic acid mixture to be produced therefrom. Specific primer extension reaction conditions of interest include, but are not limited to: linear PCR (polymerase chain reaction), strand displacement amplification, and in vitro transcription. Each of these specific primer extension reaction conditions is now reviewed in greater detail.

[0035] Where the template array is subjected to linear PCR conditions, the array is contacted in an aqueous reaction mixture with a source of DNA polymerase, dNTPs and any other desired or requisite primer extension reagents under conditions sufficient to produce linearly amplified amounts of nucleic acids, e.g., under thermal cycling conditions. As such, the polymerase employed in the subject methods is generally, though not necessarily (e.g., where new polymerase is added after each cycle), a thermostable polymerase. A variety of thermostable polymerases are known to those of skill in the art, where representative polymerases include, but are not limited to, Taq polymerase, Vent® polymerase, Pfu polymerase and the like. The amount of polymerase present in the reaction mixture may vary but is sufficient to provide for the requisite amount of polymerase activity, where the specific amount employed may be readily determined by those of skill in the art. Also present in the reaction mixture is a collection of the four dNTPs, i.e., dATP, dCTP, dGTP and dTTP. The dNTPs may be present in varying or equimolar amounts, where the amount of each dNTP typically ranges from about 10 µM to 10 mM, usually from about 100 to 300 µM. Other reagents that may be present in the reaction mixture include monovalent cations (e.g., Na+), divalent cations (e.g., Mg2+), buffers (e.g., Tris), surfactants (e.g., Triton X-100) and the like. In this linear PCR embodiment of the subject methods, the reaction mixture is subjected to thermal cycling conditions in which the temperature of the reaction mixture is cycled through annealing, primer extension and dissociation temperatures in a manner that results in the production of linearly amplified amounts of nucleic acid for each different sequence probe on the template array. The annealing temperature typically ranges from about 50°C to 80°C, usually from about 60°C to 75°C, and is maintained for a period of time ranging from about 10 sec. to 10 min., usually from about 30 sec. to 2 min. The primer extension temperature typically ranges from about 55°C to 75°C, usually from about 60°C to 70°C, and is maintained for a period of time ranging from about 30 sec. to 10 min., usually from about 1 min. to 5 min. The dissociation temperature typically ranges from about 80°C to 99°C, usually from about 90°C to 95°C, and is maintained for a period of time ranging from about 1 sec. to 2 min., usually from about 30 sec. to 1 min.
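
The cycling described in [0035] can be sketched as a simple program. The temperatures and hold times below are hypothetical picks from the "usually" ranges, not a validated protocol, and the yield model assumes ideal copying of every surface-bound template once per cycle. It illustrates why the product accumulates linearly: with only the universal primer present, released products (5'-cR-cF-S_ij-3') offer no primer-binding site and are not copied again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# One illustrative cycle, each value picked from the "usually" ranges in
# [0035]; these numbers are assumptions, not a validated protocol.
CYCLE = (
    Step("annealing", 65, 60),     # usually 60-75 °C for 30 s to 2 min
    Step("extension", 68, 120),    # usually 60-70 °C for 1 to 5 min
    Step("dissociation", 94, 30),  # usually 90-95 °C for 30 s to 1 min
)


def program_minutes(cycle, n_cycles: int) -> float:
    """Total hold time of the program, ignoring ramp times between steps."""
    return n_cycles * sum(step.seconds for step in cycle) / 60


def linear_copies(templates: int, n_cycles: int, efficiency: float = 1.0) -> float:
    """Products released by linear amplification: each surface-bound template
    is copied at most once per cycle, so yield grows with n, not 2**n."""
    return templates * n_cycles * efficiency


print(program_minutes(CYCLE, 25))  # 87.5
print(linear_copies(1000, 25))     # 25000.0
```
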

[0036] In strand displacement amplification, the array of overhang-comprising duplex nucleic acids is employed as primed template in linear amplification variations of the exponential amplification protocols described in Walker et al., Nucleic Acids Res. (1992) 20:1691-1696 and Walker et al., Proc. Natl. Acad. Sci. USA (1992) 89:392-396, as well as in U.S. Patent No. 5,648,211, the disclosure of which is herein incorporated by reference. Briefly, isothermal linear amplification is achieved as follows. Following production of the array of overhang-comprising duplex nucleic acids, the template array is subjected to a cycle of strand nicking of the universal primer after sequence cR, typically by using a restriction endonuclease. Generally, the template strand or probe sequence is protected via an appropriately placed phosphorothioate linkage in the surface-bound template strand. Extension of the 3' end exposed by the nick is then allowed to proceed by using a DNA polymerase that lacks a 5'→3' exonuclease activity but possesses a strand displacement activity, e.g., Klenow fragment. Each cycle in this protocol releases a nucleic acid molecule which has the formula 5'-cF-S_ij-3'. In certain variants of this method, nicking may be achieved by making R a half-site for a restriction endonuclease that exhibits single-strand cleavage activity, or by employing a nicking endonuclease, such as N.BstNBI, and the like.
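
The nick-and-displace cycle of [0036] can be reduced to a sequence-level sketch. It assumes idealized behaviour (every cycle nicks exactly after cR and strand-displacing extension fully rebuilds the strand) and uses hypothetical sequences and a hypothetical function name; enzyme choice, the phosphorothioate protection and reaction kinetics are not modelled.

```python
def sda_release(extended_strand: str, cr_length: int, cycles: int) -> list:
    """Idealized nick-and-displace cycles on one surface-bound duplex.

    extended_strand is the primer-derived strand 5'-cR-cF-S_ij-3' (5'->3').
    Each cycle nicks it just after cR; the polymerase extends from the new
    3' end, displacing the downstream fragment into solution."""
    if not 0 < cr_length < len(extended_strand):
        raise ValueError("the nick must fall inside the strand")
    fragment = extended_strand[cr_length:]  # 5'-cF-S_ij-3', downstream of the nick
    # Idealized: strand-displacing extension from the nick rebuilds an identical
    # cR-cF-S_ij strand, so every cycle releases one copy of the same fragment.
    return [fragment] * cycles


cR = "GTTGAC"     # complement of the example HincII site GTCAAC
cF = "GATC" * 5   # hypothetical complement of the functional domain
S = "AAACCCGGGT"  # hypothetical variable domain S_ij
products = sda_release(cR + cF + S, len(cR), cycles=3)
print(len(products), products[0])
```
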

[0037] In yet other embodiments, the subject template array of duplex nucleic acids is employed in an in vitro transcription method. In this embodiment, the template array is modified from that described above to be of the following formula:

(surface)-L-R-(C)S_ij-F-5'

wherein:

L and R are as defined above;
F is an RNA polymerase promoter, e.g., a T3 or T7 promoter; and
(C)S_ij is S_ij modified to end in a C residue.

[0038] The universal primer employed with this array has the formula 5'-cR-3'. When the template array is contacted with NTPs, T3 or T7 polymerase and the appropriate transcription buffer, ribonucleic acids of the formula 5'-(rG)rcS_ij-rcF-3' are produced, where r stands for ribonucleotide. By contacting this resultant mixture of ribonucleic acids with the DNA primer 5'-F-3' and a reverse transcriptase, a mixture of deoxyribonucleic acids suitable for use as primers in target generation protocols is produced.

[0039] The subject template arrays may also be used in other primer extension protocols for generating nucleic acids; the above protocols are merely representative of those in which the subject template arrays find use.
[0040] The above described array template based primer extension generation methods result in the production of a mixture of nucleic acids, typically a mixture of deoxyribonucleic acids, where each of the different complement variable domains of the template array is represented in the mixture, i.e., there is at least one nucleic acid in the mixture that has a variable domain that hybridizes under stringent conditions to each different complement variable domain present on the array. The length of each of the nucleic acids present in the resultant mixture typically ranges from about 20 to 60 nt, usually from about 25 to 55 nt and more usually from about 30 to 50 nt. Because of the manner in which the subject mixtures of nucleic acids are produced, the resultant mixtures of nucleic acids may be viewed as mixtures of gene specific primers, where the gene specific primers are specific for each of the different genes represented on the template array employed in the production of the nucleic acid mixture. In certain embodiments, the mixture may be "normalized" with respect to a given mRNA population, as described above.

[0041] The nucleic acid mixtures produced by the subject methods find use in a variety of different applications, and are particularly suited for use as primers in the generation of target nucleic acids, e.g., for array based differential gene expression analysis applications. Where the subject nucleic acid mixtures are used as primers for target generation in gene expression analyses, the first step is to generate a population of target nucleic acids from an initial mRNA source or sample. By target nucleic acid is meant a nucleic acid that has a sequence, e.g., S_ij, which is either the same as, or complementary to, the sequence of an mRNA found in an initial sample, where the target may be DNA or RNA and may be present in amplified amounts as compared to the initial amount of mRNA, depending on the particular target generation protocol that is employed.

[0042] In the subject methods, the target or image nucleic acids are produced from the subject nucleic acid mixtures
generally through enzymatic generation protocols. Specifically, the target nucleic acids are typically produced using
template dependent polymerization protocols and an initial mRNA source. The initial mRNA source may be present in
a variety of different samples, where the sample will typically be derived from a physiological source. The physiological
source may be derived from a variety of eukaryotic or prokaryotic sources, with physiological sources of interest including
sources derived from single-celled organisms such as yeast and multicellular organisms, including plants and
animals, particularly mammals, where the physiological sources from multicellular organisms may be derived from
particular organs or tissues of the multicellular organism, or from isolated cells derived therefrom. In obtaining the
sample of RNA to be analyzed from the physiological source from which it is derived, the physiological source may be
subjected to a number of different processing steps, where such processing steps might include tissue homogenization,
cell isolation and cytoplasm extraction, nucleic acid extraction and the like, where such processing steps are known
to those of skill in the art. Methods of isolating RNA from cells, tissues, organs or whole organisms are known to those
of skill in the art and are described in Maniatis et al. (1989), Molecular Cloning: A Laboratory Manual, 2d Ed. (Cold
Spring Harbor Press).

[0043] A number of different enzymatic protocols for generating image or target nucleic acids from an initial mRNA
sample are known and continue to be developed. Any convenient protocol may be employed, where the particular
protocol employed depends, at least in part, on a number of factors, including: whether one wants to generate amplified
amounts of target or image nucleic acid; whether one wants to generate geometrically or linearly amplified amounts
of target nucleic acid; whether bias in the amount of target can be tolerated, etc. A common feature of the protocols
that find use in preparing the image or target nucleic acids of the subject invention is the use, as primers, of the subject
nucleic acid mixtures produced using the array-based template protocols described above.
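The trade-off between geometric and linear amplification, and its bearing on bias, can be illustrated with an idealized yield model. The Python sketch below assumes a constant per-cycle efficiency for exponential methods and a fixed number of new copies per original template per round for linear methods; real reactions plateau and vary by sequence, so the figures are only indicative and the function names are hypothetical.

```python
def geometric_yield(n0: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential amplification: every copy, old or new, can serve as
    template, so each cycle multiplies the count by (1 + efficiency)."""
    return n0 * (1.0 + efficiency) ** cycles


def linear_yield(n0: float, rounds: int, copies_per_round: float = 1.0) -> float:
    """Idealized linear amplification: only the original templates are copied,
    so each round adds n0 * copies_per_round new molecules."""
    return n0 * (1.0 + copies_per_round * rounds)


def skew(yield_fn, steps: int, rate_a: float, rate_b: float) -> float:
    """Ratio of final amounts for two targets that start equal but amplify at different rates."""
    return yield_fn(1.0, steps, rate_a) / yield_fn(1.0, steps, rate_b)


if __name__ == "__main__":
    print(geometric_yield(1000, 20))            # 1000 * 2**20 copies
    print(linear_yield(1000, 20))               # 1000 * (1 + 20) copies
    print(skew(geometric_yield, 20, 1.0, 0.9))  # roughly 2.8-fold distortion
    print(skew(linear_yield, 20, 1.0, 0.9))     # about 1.1-fold distortion
```

Under this model a 10% difference in efficiency between two targets compounds to a roughly 2.8-fold distortion after 20 exponential cycles, but only about 1.1-fold after 20 linear rounds, which is why the tolerable level of bias can weigh on the choice of protocol.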

[0044] A number of nucleic acid amplification methods can be employed to generate the target nucleic acid from an
initial mRNA source, where these methods can employ the subject nucleic acid mixtures as primers. Such methods
include the "polymerase chain reaction" (PCR) as described in United States Patent Number 4,683,195, the disclosure
of which is herein incorporated by reference, and a number of transcription-based exponential amplification methods,
such as those described in U.S. Patent Nos. 5,130,238; 5,399,491; and 5,437,990; the disclosures of which are herein
incorporated by reference. Each of these methods uses primer-dependent nucleic acid synthesis to generate a DNA
or RNA product, which serves as a template for subsequent rounds of primer-dependent nucleic acid synthesis. Each
process uses (at least) two primer sequences complementary to different strands of a desired nucleic acid sequence
and results in an exponential increase in the number of copies of the target sequence.
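The requirement for two primers complementary to different strands can be sketched as an idealized in silico amplification. In the Python sketch below, which assumes exact matching and a template given as its top strand written 5'->3', the forward primer has the top-strand sequence (and so anneals to the bottom strand), while the reverse primer anneals to the top strand, so its reverse complement must occur on the top strand downstream of the forward primer. The names `amplicon` and `revcomp` are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement; input and output are both read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon(top_strand: str, fwd: str, rev: str) -> Optional[str]:
    """Idealized two-primer amplification product (top strand, 5'->3').

    fwd has the same sequence as the top strand; rev anneals to the top strand,
    so revcomp(rev) must occur on the top strand downstream of fwd.
    Exact matches only; returns None if the primers do not flank a region.
    """
    top, fwd, rev = top_strand.upper(), fwd.upper(), rev.upper()
    start = top.find(fwd)
    if start < 0:
        return None
    rev_site = top.find(revcomp(rev), start + len(fwd))
    if rev_site < 0:
        return None
    return top[start:rev_site + len(rev)]


if __name__ == "__main__":
    template = "AAAA" + "ATGCGTAC" + "GGGGGGGG" + "TTCCAAGG" + "CCCC"
    print(amplicon(template, "ATGCGTAC", "CCTTGGAA"))
```

Only the region bounded by the two primer sites is copied in every cycle, which is why the product count grows exponentially for that region alone.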

[0045] Alternatively, amplification methods that utilize a single primer may be employed to generate target or image
nucleic acids from an initial mRNA sample, where the subject nucleic acid mixtures are employed as primers. See, e.g.,
U.S. Patent Nos. 5,554,516 and 5,716,785, the disclosures of which are herein incorporated by reference. The methods
reported in these patents utilize a single primer containing an RNA polymerase promoter sequence and a sequence
complementary to the 3'-end of the desired nucleic acid target sequence(s) ("promoter-primer"). In both methods, the
promoter-primer is added under conditions where it hybridizes to the target sequence(s) and is converted to a substrate
for RNA polymerase. In both methods, the substrate intermediate is recognized by RNA polymerase, which produces
multiple copies of RNA complementary to the target sequence(s) ("cRNA").
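The structure of such a promoter-primer can be sketched in a few lines of Python. The sketch assumes the widely used T7 promoter sense sequence TAATACGACTCACTATAG as the RNA polymerase promoter, places it 5' of a k-nt sequence complementary to the 3'-end of the target, and models the cRNA as an ideal complementary copy of the target; the function names, the default k of 20 and the promoter choice are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
T7_PROMOTER = "TAATACGACTCACTATAG"  # assumed promoter choice (T7, sense strand)


def revcomp(seq: str) -> str:
    """Reverse complement; input and output are both read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def promoter_primer(target: str, k: int = 20, promoter: str = T7_PROMOTER) -> str:
    """Promoter joined 5' of a k-nt sequence complementary to the target's 3'-end.

    target is written 5'->3' in DNA letters; the returned primer is also 5'->3'.
    """
    if len(target) < k:
        raise ValueError("target shorter than the annealing region")
    return promoter + revcomp(target[-k:])


def idealized_crna(target: str) -> str:
    """RNA complementary to the target, written 5'->3' with U in place of T."""
    return revcomp(target).replace("T", "U")


if __name__ == "__main__":
    target = "CCCCCCCCCC" + "GATTACAGATTACAGATTAC"  # toy target, not from the source
    print(promoter_primer(target))
    print(idealized_crna(target))
```

Because the promoter portion is shared and only the annealing portion is target-specific, the same construction applies to every member of a primer mixture.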

[0046] Whatever process is employed to generate the target nucleic acid, where representative protocols have been
provided immediately above, the process may be modified to include the use of chemical analogs of nucleotides that
have been modified to include a label moiety, e.g., an organic fluorophore, an isotopic label, a capture ligand, e.g.,
biotin, etc. As a result, the target nucleic acids produced using the subject nucleic acid mixtures as primers often are
labeled, either directly or indirectly, for use in subsequent hybridization assays.
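How densely a target is labeled when a labeled nucleotide analog is mixed into the reaction can be estimated with a simple model. The Python sketch below assumes the analog replaces a fixed fraction of one nucleotide and is incorporated without bias relative to the unlabeled nucleotide; both assumptions, the default values and the function names are hypothetical, and real analogs are often incorporated less efficiently.

```python
def expected_labels(strand: str, labeled_base: str = "T", fraction: float = 0.25) -> float:
    """Mean number of label moieties per synthesized strand, assuming the labeled
    analog replaces `fraction` of one nucleotide and is incorporated without bias."""
    return strand.upper().count(labeled_base) * fraction


def unlabeled_fraction(strand: str, labeled_base: str = "T", fraction: float = 0.25) -> float:
    """Probability that a strand carries no label at all under the same assumption
    (each eligible position is labeled independently)."""
    return (1.0 - fraction) ** strand.upper().count(labeled_base)


if __name__ == "__main__":
    strand = "TTAGTC"  # toy product sequence, not from the source
    print(expected_labels(strand, "T", 0.5), unlabeled_fraction(strand, "T", 0.5))
```

For short strands with few eligible positions, the unlabeled fraction can be substantial; this model makes that easy to estimate before choosing an analog ratio.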

[0047] The above target generation protocols are merely representative and by no means exhaustive of all of the
different types of protocols in which the subject nucleic acid mixtures find use as primers.

[0048] The resultant populations of target nucleic acids find use as, inter alia, target in hybridization assays, such
as gene expression analysis applications. Gene expression analysis protocols are well known to those of skill in the
art, and the populations of target nucleic acids produced by the subject methods find use in many, if not all, of these
protocols. In gene expression analysis protocols using the subject populations of labeled target, the population of
labeled target is typically contacted with a population of probe nucleic acids, e.g., on an array, under hybridization
conditions, usually stringent hybridization conditions. The array may be the same array that is used as the template
array or a different array. Following hybridization, non-bound target is removed or separated from the probe, e.g., by
washing. Washing results in a pattern of hybridized target, which may be read using any convenient protocol, e.g., with
a fluorescent scanner device. From this pattern, information regarding the mRNA expression profile in the initial mRNA
sample from which the target population was produced may be readily derived or deduced.
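One common way to turn a scanned hybridization pattern into differential expression information is a per-feature log ratio between two samples. The Python sketch below assumes intensities from a test and a reference hybridization are available as dictionaries keyed by feature, subtracts a single uniform background value, and clamps corrected intensities at a floor; these conventions and the function name are assumptions for illustration, not the protocol of the source.

```python
import math


def log2_ratios(test, reference, background=0.0, floor=1.0):
    """Per-feature log2(test/reference) after subtracting a uniform background.

    test, reference: dicts mapping feature ID -> raw intensity from each scan.
    Corrected intensities are clamped at `floor` to avoid division by zero or
    logs of non-positive numbers. Only features present in both scans are reported.
    """
    ratios = {}
    for feature in test.keys() & reference.keys():
        t = max(test[feature] - background, floor)
        r = max(reference[feature] - background, floor)
        ratios[feature] = math.log2(t / r)
    return ratios


if __name__ == "__main__":
    test = {"g1": 410.0, "g2": 60.0}       # toy intensities, not from the source
    reference = {"g1": 110.0, "g2": 60.0}
    print(sorted(log2_ratios(test, reference, background=10.0).items()))
```

A ratio of +1 or -1 corresponds to a two-fold difference in signal between the samples; features near zero are unchanged.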

[0049] In certain embodiments, the subject methods include a step of transmitting data from at least one of the
detecting and deriving steps, as described above, to a remote location. By "remote location" is meant a location other
than the location at which the array is present and hybridization occurs. For example, a remote location could be
another location (e.g., office, lab, etc.) in the same city, another location in a different city, another location in a different
state, another location in a different country, etc. The data may be transmitted to the remote location for further
evaluation and/or use. Any convenient telecommunications means may be employed for transmitting the data, e.g.,
facsimile, modem, internet, etc.

[0050] Also provided by the subject invention are kits for use in preparing the subject target populations of nucleic
acids. The kits may comprise containers, each with one or more of the various reagents (typically in concentrated form)
utilized in the methods, including, for example, buffers, dNTPs, reverse transcriptase, etc., where the kits will at least
include a sufficient amount of universal primer, e.g., an amount ranging from about 25 pmol to 25 µmol. In addition,
the subject kits may include an array of single stranded probe nucleic acids (or a means for producing the same)
wherein each probe has a constant region and complement variable region, as described above. Where the kit has a
means for producing the template array, the kit typically includes a substrate having a planar surface, and one or more
reagents necessary for synthesis of the probes, which may vary depending on the nature of the protocol to be used
to generate the array. The kits may further include reagents necessary for producing labeled target nucleic acids, where
such reagents may include reverse transcriptase, labeled dNTPs, etc. A set of instructions will also typically be included,
where the instructions may be associated with a package insert and/or the packaging of the kit or the components
thereof.

[0051] The following examples are offered by way of illustration and not by way of limitation. 


EXPERIMENTAL 
Example 

[0052] In order to demonstrate the feasibility of using an oligonucleotide array as a template for enzymatic polynucleotide
synthesis, the following experiment was performed:

1. An in situ oligonucleotide array was manufactured; the array contained 8455 (89 x 95) features (~100 µm
diameter) with the following sequence:

CTTTCTTGGATCAACCCGCTCAATGCTCCCTATAGTGAGTCGTATTACAATTCATTTTTT-surface (SEQ ID NO:01)

In the above sequence, the large dash underlines indicate the unique sequence cS,y, the small dashes indicate
the recognition/functional sequence F-R (in this case, a T7 RNA polymerase promoter) and the continuous
underline indicates a linker sequence Q.

2. The array was hybridized for 1 hour at 60°C to the following oligonucleotide (PT7, 250 nM):



3'-GATATCACTCAGCATAATGTTAAGTA-5' (SEQ ID NO:02)

i.e., the complementary strand of the T7 promoter portion of the oligonucleotide on the surface. The purpose of
this treatment was to produce a double-stranded T7 promoter, which is necessary for T7 RNA polymerase activity
(note that a fully double-stranded template is not necessary; a 5'-overhanging single-stranded template is known
to be sufficient).

3. The array was washed briefly with ice-cold water (to remove salts from the hybridization buffer) and blown dry
with nitrogen. The hybridization chamber was reassembled and filled with a transcription mixture (250 µl) containing
T7 transcription buffer (including NTPs), T7 RNA polymerase, 1% Triton X-100 and the oligonucleotide of step 2
(250 nM). The assembly was incubated overnight at 40°C. An identical positive control array was also incubated
in contact with the same transcription mixture, with a soluble version of the array-bound oligonucleotide of step 1
added (HCV185; 250 nM). Finally, a second positive control mixture was incubated in a PCR tube.

4. The transcription mixtures were removed from the experimental and positive control arrays. Half of each array
mixture was concentrated >10x using a Microcon-3 ultrafiltration concentrator.

5. The various samples were analyzed on a 15% polyacrylamide/urea gel, stained with ethidium bromide and
visualized by fluorescence. The results are provided in Fig. 1.
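The complementarity relied on in steps 1 and 2 can be checked directly against the sequence listing. The Python sketch below reads SEQ ID NO:01 5'->3' with its 3' end on the surface, reverses PT7 (printed 3'->5' in step 2) into 5'->3' form, locates the region PT7 pairs with, and reports the single-stranded 5' overhang that remains. The idealized run-off transcript is an added assumption: it supposes that T7 RNA polymerase initiates at the G ending the assumed promoter sequence TAATACGACTCACTATAG and copies to the free 5' end of the array-bound template.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement; input and output are both read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# SEQ ID NO:01 from the sequence listing, read 5'->3' (3' end attached to the surface).
SEQ_ID_01 = ("ctttcttgga tcaacccgct caatgctccc tatagtgagt "
             "cgtattacaa ttcatttttt").replace(" ", "").upper()
# PT7 (SEQ ID NO:02) as printed in step 2, i.e. read 3'->5'.
PT7_3_TO_5 = "GATATCACTCAGCATAATGTTAAGTA"
T7_PROMOTER = "TAATACGACTCACTATAG"  # assumed T7 promoter sense sequence

pt7 = PT7_3_TO_5[::-1]                # PT7 rewritten 5'->3'
site = SEQ_ID_01.find(revcomp(pt7))   # 0-based start of the PT7-binding region
overhang = SEQ_ID_01[:site]           # single-stranded 5' overhang after annealing
tail = SEQ_ID_01[site + len(pt7):]    # unpaired bases between the duplex and the surface

# Idealized run-off transcript (assumption): initiation at the final G of the promoter,
# which pairs with SEQ_ID_01[site], and elongation to the template's free 5' end.
transcript = revcomp(SEQ_ID_01[:site + 1]).replace("T", "U")

if __name__ == "__main__":
    print("T7 promoter in PT7:", T7_PROMOTER in pt7)
    print("PT7 pairs from position", site + 1, "; overhang", len(overhang), "nt; tail", tail)
    print("idealized transcript:", transcript, len(transcript), "nt")
```

Run as written, the check should report a 29-nt single-stranded overhang at the 5' end of the surface oligonucleotide and a 30-nt idealized transcript beginning with GGG; the transcript length is an inference from the sequences, not a result reported in the Example.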


[0053] The results provided in Fig. 1 clearly show visible transcript in the concentrated experimental array sample
(lane 2). Separate negative control experiments demonstrated that reactions which omitted the complementary
oligonucleotide PT7 or the T7 RNA polymerase did not produce visible bands on a similar gel (data not shown). Microcon
concentration of ~80 µl of 250 nM PT7 oligo also failed to yield a visible band on a similar gel (data not shown). Thus,
the observed gel pattern is dependent upon the presence of T7 RNA polymerase and a double-stranded T7 promoter,
and is not due to the added oligonucleotide PT7. Furthermore, the chief product of transcription from an array-bound
template displays the same gel migration rate as the chief product of the positive-control transcription reactions. The most
likely explanation for the observed data is that we have reduced to practice the T7 RNA polymerase version of enzymatic
oligonucleotide production from an array template.

[0054] It is evident that the subject invention provides a number of advantages over current target nucleic acid
generation protocols. These advantages include the provision of an economical and rapid synthesis method for custom
primer mixtures that are particularly suited for use in target generation for use with nucleic acid arrays. Using the
subject methods leads to increased specificity in microarray based assays. Using the subject methods, one can develop
microarray based assays in which the microarray is customized to be sensitive or insensitive to various splicing variants
of different genes of interest, even where the splicing variant is present proximal to the 5' end of the coding sequence.
Allele specific mRNA profiling is possible with the subject methods by picking the variable region so that the 3'-end of
the primer produced hybridizes at a base where the two alleles differ. In addition, the subject methods can be employed
to easily produce normalized target nucleic acid mixtures. Accordingly, the invention represents a significant
contribution to the art.
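The allele-specific design rule can be sketched as follows. The Python sketch assumes the mRNA is written 5'->3' in DNA letters and that the primer should anneal to the stretch beginning at the variant position and extending toward the mRNA 3' end, so that the primer's 3'-terminal base pairs with the base at which the alleles differ. The function name and the default primer length of 20 nt are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement; input and output are both read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def allele_specific_primer(mrna: str, variant_index: int, length: int = 20) -> str:
    """Primer (5'->3') annealing to mrna[variant_index : variant_index + length].

    The mRNA is given 5'->3' in DNA letters; the primer's 3'-terminal base is the
    complement of the base at variant_index, i.e. it sits on the allelic difference.
    """
    region = mrna[variant_index:variant_index + length]
    if variant_index < 0 or len(region) < length:
        raise ValueError("not enough sequence 3' of the variant position")
    return revcomp(region)


if __name__ == "__main__":
    allele_a = "AAAAA" + "G" + "C" * 19  # toy alleles differing at index 5
    allele_b = "AAAAA" + "A" + "C" * 19
    print(allele_specific_primer(allele_a, 5))
    print(allele_specific_primer(allele_b, 5))
```

The two primers share all but their 3'-terminal base, so extension of each is expected to depend on a match at the variant position.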

[0055] All publications and patent applications cited in this specification are herein incorporated by reference as if
each individual publication or patent application were specifically and individually indicated to be incorporated by
reference. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an
admission that the present invention is not entitled to antedate such publication by virtue of prior invention.
[0056] Although the foregoing invention has been described in some detail by way of illustration and example for
purposes of clarity of understanding, it is readily apparent to those of ordinary skill in the art in light of the teachings
of this invention that certain changes and modifications may be made thereto without departing from the scope of the
appended claims.
SEQUENCE LISTING
<110> Agilent Technologies, Inc. 

<120> Array based methods for synthesizing nucleic acid mixtures 

<130> IT/N11455 

<150> US 09/628,472 
<151> 2000-07-31 



<160> 2 

<170> PatentIn version 3.1

<210> 1 

<211> 60 

<212> DNA 

<213> artificial sequence

<220> 

<223> synthetic probe 



<400> 1 

ctttcttgga tcaacccgct caatgctccc tatagtgagt cgtattacaa ttcatttttt 60 



<210> 2 

<211> 26 

<212> DNA 

<213> artificial sequence 
<220> 

<223> synthetic probe 

<400> 2 

gatatcactc agcataatgt taagta 26 






Claims 

1. A method for producing a mixture of nucleic acids, said method comprising:


(a) providing an array of distinct single-stranded probe nucleic acids of differing sequence where each distinct 
probe present on said array comprises a constant domain and a complement variable domain; 

(b) contacting said array of single-stranded probe nucleic acids with nucleic acids complementary to said
constant domain under hybridization conditions, whereby a template array of overhang comprising duplex
nucleic acids is produced, wherein each overhang comprising duplex nucleic acid of said array comprises a
double-stranded constant region and a single-stranded variable region overhang; and

(c) subjecting said template array of overhang comprising duplex nucleic acids to primer extension reaction 
conditions under conditions sufficient to produce said mixture of nucleic acids; 

whereby said mixture of nucleic acids is produced.

2. The method according to Claim 1, wherein said mixture of nucleic acids is a mixture of deoxyribo-oligonucleotides.
3. The method according to Claim 1 or 2, wherein said constant domain comprises at least one domain selected
from the group consisting of: a linker domain; a functional domain; and a recognition domain. 

4. The method according to Claim 1, 2 or 3, wherein said step (c) comprises a protocol selected from the group
consisting of: linear PCR; strand displacement amplification; and in vitro transcription. 

5. The method according to any of Claims 1 to 4, wherein each distinct surface immobilized single-stranded probe 
present on said array is described by the formula: 

surface-L-R-F-cV-5' 

wherein: 

L is an optional linking domain; 
R is a recognition domain; 
F is a functional domain; and 

cV is a complement domain having a sequence that hybridizes under stringent conditions to a variable domain 
of one of said distinct oligonucleotides of said plurality; and 

said nucleic acids complementary to said constant domain are of the formula: 

5'-cR-cF-3'

wherein: 

cR is the complement of R; and 
cF is the complement of F. 

6. The method according to Claim 5, wherein said linker domain ranges in length from about 0 to 10 bases.

7. The method according to Claim 5 or 6, wherein said functional domain is an RNA polymerase promoter domain.

8. The method according to Claim 5, 6 or 7, wherein said recognition domain is recognized by a restriction
endonuclease.

9. A method of making a population of target nucleic acids from an initial mRNA sample, said method comprising: 

(a) generating a mixture of nucleic acids according to the method of any of Claims 1 to 8; and 

(b) employing said mixture of nucleic acids as primers in a target generation step in which target nucleic acids 
are produced from said mRNA sample; 

whereby said population of target nucleic acids is produced. 

10. A hybridization assay comprising the steps of: 

(a) generating a set of target nucleic acids according to the method of Claim 9; 

(b) contacting said set of target nucleic acids with an array of probe nucleic acids under hybridization conditions;
and

(c) detecting the presence of target nucleic acids hybridized to probe nucleic acids of said array. 

11. An array comprising a plurality of distinct single-stranded probe nucleic acids immobilized on a surface of a substrate,
wherein each of said single-stranded probe nucleic acids is described by the formula: 

surface-L-R-F-cV-5'
wherein:

L is an optional linking domain; 
R is a recognition domain;
F is a functional domain; and 
V is a variable domain;

wherein only said variable domain V is different for each distinct single-stranded probe nucleic acid of said
array.

12. A kit for use in the method of any of Claims 1 to 9, said kit comprising:

(a) universal primer; and 

(b) an array of probe nucleic acids or a means for producing the same. 
FIGURE 1




L= 77KB RNA ladder

1= Experimental array 2 (annealed with PT7) 
2= Concentrated experimental array 2 

3= Positive control array 4 (annealed with PT7, and PT7/HCV185
complex in mix)

4= Concentrated positive control array 4 
5= Positive control tube 